Entity-balanced Gaussian pLSA for Automated Comparison
Abstract
Community-created content (e.g., product descriptions, reviews) typically discusses one entity at a time, which makes comparing two or more entities difficult and time-consuming for a user. In response, we define a novel task: automatically generating entity comparisons from text. Our output is a table that semantically clusters descriptive phrases about entities. Our clustering algorithm is a Gaussian extension of probabilistic latent semantic analysis (pLSA), in which each phrase is represented in a word-vector embedding space. In addition, our algorithm attempts to balance information about entities in each cluster so that, where possible, the resulting comparison tables are meaningful. We test our system's effectiveness on two domains, travel articles and movie reviews, and find that users strongly prefer entity-balanced clusters.
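To make the clustering idea concrete, here is a minimal sketch of a pLSA-style Gaussian mixture in which each entity has its own mixture weights over clusters and each cluster is a Gaussian in embedding space. It is not the authors' implementation: the function name `gaussian_plsa`, the spherical covariances, and the EM initialization are all assumptions made for illustration, and the entity-balancing step is left out because the abstract does not specify it.

```python
import numpy as np

def gaussian_plsa(X, entity_ids, n_clusters, n_iter=50, seed=0):
    """EM for a Gaussian mixture with per-entity mixture weights (hypothetical sketch).

    X          : (n_phrases, dim) array of phrase embeddings.
    entity_ids : (n_phrases,) integer array, entity index of each phrase.
    Spherical covariances are an assumption made to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_entities = int(entity_ids.max()) + 1
    # Initialize cluster means from randomly chosen phrases.
    means = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    variances = np.full(n_clusters, X.var() + 1e-6)
    weights = np.full((n_entities, n_clusters), 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: r[i, k] is proportional to P(k | entity of i) * N(x_i | mean_k, var_k * I).
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_pdf = -0.5 * (sq_dist / variances + d * np.log(2 * np.pi * variances))
        log_r = np.log(weights[entity_ids] + 1e-12) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update cluster means and spherical variances.
        mass = r.sum(axis=0) + 1e-12
        means = (r.T @ X) / mass[:, None]
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (r * sq_dist).sum(axis=0) / (d * mass) + 1e-6

        # M-step: per-entity mixture weights (the pLSA part).
        weights = np.full((n_entities, n_clusters), 1e-12)
        np.add.at(weights, entity_ids, r)
        weights /= weights.sum(axis=1, keepdims=True)

    return r.argmax(axis=1), weights, means
```

The returned labels assign each phrase to a cluster, and each row of `weights` shows how one entity's phrases spread across clusters, which is the quantity a balancing criterion would presumably act on.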
Similar resources
Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis
Mixture models, such as the Gaussian mixture model (GMM), are widely used for modeling data. A GMM assumes that all data points are generated from a set of Gaussian components sharing one set of mixture weights. A natural extension of the GMM is the probabilistic latent semantic analysis (PLSA) model, which assigns different mixture weights to each data point. Thu...
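The contrast drawn in this snippet can be written out as follows. The notation ($K$ components, weights $\pi$, means $\mu_k$, covariances $\Sigma_k$) is assumed here for illustration and is not taken from the cited paper.

```latex
% GMM: one shared set of mixture weights for every data point x_i
p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1

% PLSA-style extension: each data point i has its own weights
p(x_i) = \sum_{k=1}^{K} \pi_{ik} \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_{ik} = 1 \ \text{for each } i
```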
Comparison of Dimension Reduction Methods for Automated Essay Grading
Automatic Essay Assessor (AEA) is a system that utilizes information retrieval techniques such as Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA) for automatic essay grading. The system uses learning materials and relatively few teacher-graded essays for calibrating the scoring mechanism before grading. We performed a series o...
A Descriptive Framework for the Field of Data Mining and Knowledge Discovery
Abstracts of forty-nine regular papers from PAKDD 2005 [Ho et al. 2005], which were not used in the framework-building process, were collected and analyzed to see whether they fit the categories identified by grounded theory. The abstract of each article was analyzed to identify the primary objective(s) the author(s) are addressing. Take the article “Adjusting Mixture Weights of Gaussian Mixture Model vi...
Face Detection with methods based on color by using Artificial Neural Network
Face detection methods are used to provide security. The problem with these methods is that faces cannot be easily categorized because of the great differences and variety among individuals' faces. In this paper, a face detection method based on skin color data is presented to overcome these problems. The researcher gathered a face database of 30 individuals consisting of ov...
Correlated PLSA for Image Clustering
Probabilistic Latent Semantic Analysis (PLSA) has become a popular topic model for image clustering. However, the traditional PLSA method considers each image (document) independently, which often conflicts with reality. In this paper, we present an improved PLSA model, named Correlated Probabilistic Latent Semantic Analysis (C-PLSA). Different from PLSA, the topics of the gi...
Publication date: 2016